Improving position-specific predictions of protein functional sites using phylogenetic motifs
Abstract
Motivation: Accurate computational prediction of protein functional sites is critical to maximizing the utility of recent high-throughput sequencing efforts. Of the available approaches, position-specific conservation scores remain among the most popular because they are accurate and easy to compute. Unfortunately, high false positive rates remain a limiting factor. Using phylogenetic motifs (PMs), we have developed two combined (conservation + PMs) prediction schemes that significantly improve prediction accuracy.

Results: Our first approach, called position-specific MINER (psMINER), rank orders alignment columns by conservation. Subsequently, positions that are not also identified as PMs are excluded from the prediction set. This approach yields a statistically significant improvement in prediction accuracy over the underlying conservation scores. Increased accuracy is a general result: improvement is observed over several different conservation scores that span a continuum of complexity. In addition, a hybrid MINER (hMINER) that quantitatively considers both scoring regimes provides further improvement. More importantly, it provides critical insight into the relative importance of phylogeny versus alignment conservation. Both methods outperform other common prediction algorithms that also utilize phylogenetic concepts. Finally, we demonstrate that the presented results are critically sensitive to functional site definition, highlighting the need for more complete benchmarks within the prediction community.
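The two-step filter described for psMINER (rank columns by conservation, then exclude positions that are not also PMs) can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration, not the authors' implementation: the entropy-based conservation score, the `top_n` cutoff, the precomputed `pm_positions` set, and the linear weighting in `hybrid_sketch` are all hypothetical, since the abstract names neither the scores nor the way hMINER combines them.

```python
import math
from collections import Counter


def column_conservation(column):
    """Return 1 minus the base-20 Shannon entropy of one column (gaps ignored).

    Assumption: this is just one simple conservation score; the abstract
    does not say which scores were used.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    entropy = -sum((n / total) * math.log(n / total, 20) for n in counts.values())
    return 1.0 - entropy


def psminer_sketch(alignment, pm_positions, top_n):
    """Rank columns by conservation, keep the top_n, then drop non-PM positions.

    alignment    : list of equal-length aligned sequences
    pm_positions : set of 0-based column indices already identified as PMs
                   (hypothetical input; PM detection itself is not shown)
    top_n        : hypothetical cutoff on the conservation ranking
    """
    columns = list(zip(*alignment))
    scores = [column_conservation(col) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked[:top_n] if i in pm_positions]


def hybrid_sketch(conservation, pm_scores, weight=0.5):
    """Hypothetical linear mix of both scores; the weighting is an assumption."""
    return [weight * c + (1.0 - weight) * p for c, p in zip(conservation, pm_scores)]


alignment = ["ACDG", "ACEG", "ACFG", "ACHG"]
predicted = psminer_sketch(alignment, pm_positions={1, 2, 3}, top_n=3)
```

In this toy alignment, columns 0, 1, and 3 are fully conserved and column 2 is variable, so the conservation ranking alone would keep columns 0, 1, and 3. The PM filter then removes column 0, which is how the combined scheme is meant to cut false positives.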
Similar articles
Bayesian Markov models consistently outperform PWMs at predicting motifs in nucleotide sequences
Position weight matrices (PWMs) are the standard model for DNA and RNA regulatory motifs. In PWMs nucleotide probabilities are independent of nucleotides at other positions. Models that account for dependencies need many parameters and are prone to overfitting. We have developed a Bayesian approach for motif discovery using Markov models in which conditional probabilities of order k - 1 act as ...
Discovery of regulatory elements by a computational method for phylogenetic footprinting.
Phylogenetic footprinting is a method for the discovery of regulatory elements in a set of orthologous regulatory regions from multiple species. It does so by identifying the best conserved motifs in those orthologous regions. We describe a computer algorithm designed specifically for this purpose, making use of the phylogenetic relationships among the sequences under study to make more accurat...
Phylogenetic analysis of two Iranian grapevine virus A isolates using coat protein gene sequence
Symptomatic grapevine samples were collected from vineyards in Zanjan province to detect Grapevine virus A. Total RNA was extracted from symptomatic leaf samples and subjected to cDNA synthesis using random hexamer primers. Then, a DNA fragment around 800 bp including the complete coat protein (CP) gene was amplified from nine out of 57 samples by polymerase chain reaction (PCR) using specific ...
Functional Characterization of Variations on Regulatory Motifs
Transcription factors (TFs) regulate gene expression through specific interactions with short promoter elements. The same regulatory protein may recognize a variety of related sequences. Moreover, once they are detected it is hard to predict whether highly similar sequence motifs will be recognized by the same TF and regulate similar gene expression patterns, or serve as binding sites for disti...
Positive and negative design for nonconsensus protein-DNA binding affinity in the vicinity of functional binding sites.
Recent experiments provide an unprecedented view of protein-DNA binding in yeast and human genomes at single-nucleotide resolution. These measurements, performed over large cell populations, show quite generally that sequence-specific transcription regulators with well-defined protein-DNA consensus motifs bind only a fraction among all consensus motifs present in the genome. Alternatively, prot...
Journal: Bioinformatics
Volume 24, Issue 20
Publication date: 2008